
    Interpretability and Generalization of Deep Low-Level Vision Models

    Low-level vision is an important class of computer vision tasks, covering image restoration problems such as image super-resolution, image denoising, and image deraining. In recent years, deep learning has become the de facto method for solving low-level vision problems, owing to its excellent performance and ease of use. By training on large amounts of paired data, deep low-level vision models are expected to learn rich semantic knowledge and process images intelligently in real-world applications. However, because our understanding of deep learning models and low-level vision tasks remains shallow, we cannot explain the successes and failures of these models. Deep learning models are widely regarded as "black boxes" due to their complexity and non-linearity. We cannot tell what information a model uses when processing an input, or whether it has learned what we intended. When a model fails, we cannot identify the underlying source of the problem, as with the generalization problem of low-level vision models. This research proposes interpretability analysis of deep low-level vision models to gain deeper insight into deep learning for low-level vision tasks. I aim to elucidate the mechanisms of the deep learning approach and to discern insights regarding the successes and shortcomings of these methods. This is the first study to perform interpretability analysis on deep low-level vision models.
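    To make this concrete, below is a minimal sketch of one common interpretability probe, gradient-based attribution, applied to a toy super-resolution network. Both the model and the probe are illustrative assumptions, not the specific analysis method used in the thesis.

    # Hypothetical sketch: which input pixels most influenced a chosen
    # output region of a super-resolution network?
    import torch
    import torch.nn as nn

    # Stand-in SR model; in practice, substitute any pretrained network.
    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3 * 4, 3, padding=1), nn.PixelShuffle(2),  # x2 upscale
    )
    model.eval()

    lr = torch.rand(1, 3, 64, 64, requires_grad=True)  # low-resolution input
    sr = model(lr)  # (1, 3, 128, 128)

    # Attribute a small output patch back to the input: sum the patch and
    # backpropagate; the input-gradient magnitude is a crude saliency map
    # showing which input regions the model relied on.
    target = sr[..., 60:68, 60:68].sum()
    target.backward()
    saliency = lr.grad.abs().sum(dim=1)  # (1, 64, 64) per-pixel importance
    print(saliency.shape, saliency.max())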

    Rethinking the Pipeline of Demosaicing, Denoising and Super-Resolution

    Incomplete color sampling, noise degradation, and limited resolution are three key problems that are unavoidable in modern camera systems. Demosaicing (DM), denoising (DN), and super-resolution (SR) are the core components of a digital image processing pipeline that address these three problems, respectively. Although each problem has been studied actively, the mixture problem of DM, DN, and SR, which is of higher practical value, has received too little attention. Such a mixture problem is usually solved by a sequential solution (applying each method independently in a fixed order: DM → DN → SR), or is simply tackled by an end-to-end network without sufficient analysis of the interactions among the tasks, resulting in an undesirable drop in final image quality. In this paper, we rethink the mixture problem from a holistic perspective and propose a new image processing pipeline: DN → SR → DM. Extensive experiments show that simply reordering the usual sequential solution according to our proposed pipeline enhances image quality by a large margin. We further adopt the proposed pipeline in an end-to-end network and present the Trinity Enhancement Network (TENet). Quantitative and qualitative experiments demonstrate the superiority of TENet over the state of the art. In addition, we note that the literature lacks a fully color-sampled dataset. To this end, we contribute a new high-quality, fully color-sampled real-world dataset, namely PixelShift200. Our experiments show the benefit of the proposed PixelShift200 dataset for raw image processing. Code is available at https://github.com/guochengqian/TENet.
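    As a rough illustration of the two orderings (not TENet itself, which is end-to-end), the sketch below composes placeholder DM, DN, and SR stages in both sequences; every stage implementation here is a hypothetical stand-in.

    # Sketch of the two pipeline orderings; placeholder stages only.
    import numpy as np

    def demosaic(raw):
        # Placeholder: a real bilinear demosaicer would go here.
        return np.repeat(raw[..., None], 3, axis=-1).astype(np.float32)

    def denoise(img):
        # Placeholder: e.g. a pretrained denoiser or BM3D.
        return img

    def upscale(img, scale=2):
        # Placeholder: nearest-neighbor stands in for a learned SR model.
        return img.repeat(scale, axis=0).repeat(scale, axis=1)

    raw = np.random.rand(32, 32).astype(np.float32)  # noisy Bayer mosaic

    # Conventional order: DM -> DN -> SR.
    baseline = upscale(denoise(demosaic(raw)))

    # Proposed order: DN -> SR -> DM. Restoration runs before demosaicing,
    # so demosaicing artifacts are never amplified by later stages.
    proposed = demosaic(upscale(denoise(raw)))

    print(baseline.shape, proposed.shape)  # both (64, 64, 3)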

    Networks are Slacking Off: Understanding Generalization Problem in Image Deraining

    Deep deraining networks, while successful on laboratory benchmarks, consistently encounter substantial generalization issues when deployed in real-world applications. A prevailing perspective in deep learning encourages the use of highly complex training data, with the expectation that richer image content knowledge will help overcome the generalization problem. However, through comprehensive and systematic experimentation, we discovered that this strategy does not enhance the generalization capability of these networks. On the contrary, it exacerbates the tendency of networks to overfit to specific degradations. Our experiments reveal that better generalization in a deraining network can be achieved by simplifying the complexity of the training data. This is because the networks are slacking off during training, that is, learning the least complex elements in the image content and degradation that minimize the training loss. When the complexity of the background image is lower than that of the rain streaks, the network prioritizes reconstruction of the background, thereby avoiding overfitting to the rain patterns and achieving improved generalization performance. Our research not only offers a valuable perspective and methodology for better understanding the generalization problem in low-level vision tasks, but also shows promising practical potential.
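    The sketch below illustrates the kind of training-pair synthesis this finding suggests: a deliberately low-complexity background combined with an additive rain layer, so that reconstructing the background is the network's path of least resistance. All function names here are hypothetical, used only to make the "simpler training data" idea concrete.

    # Hypothetical training-pair synthesis with a simple background.
    import numpy as np

    rng = np.random.default_rng(0)

    def simple_background(h, w):
        # Low-complexity content: a smooth horizontal gradient.
        return np.tile(np.linspace(0.2, 0.8, w, dtype=np.float32), (h, 1))

    def rain_streaks(h, w, density=0.02):
        # Sparse bright pixels as a crude additive rain layer.
        mask = rng.random((h, w)) < density
        streaks = np.zeros((h, w), dtype=np.float32)
        streaks[mask] = 0.5
        return streaks

    h, w = 64, 64
    clean = simple_background(h, w)
    rainy = np.clip(clean + rain_streaks(h, w), 0.0, 1.0)
    # (rainy, clean) would form one training pair for a deraining network.
    print(rainy.shape, clean.shape)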

    Recursive Generalization Transformer for Image Super-Resolution

    Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Owing to the quadratic computational complexity of self-attention (SA) in Transformers, existing methods tend to restrict SA to local regions to reduce overhead. However, this local design limits the exploitation of global context, which is crucial for accurate image reconstruction. In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which captures global spatial information and is suitable for high-resolution images. Specifically, we propose recursive-generalization self-attention (RG-SA). It recursively aggregates input features into representative feature maps, and then uses cross-attention to extract global information. Meanwhile, the channel dimensions of the attention matrices (query, key, and value) are scaled to mitigate redundancy in the channel domain. Furthermore, we combine RG-SA with local self-attention to enhance the exploitation of global context, and propose hybrid adaptive integration (HAI) for module integration. HAI allows direct and effective fusion between features at different levels (local or global). Extensive experiments demonstrate that our RGT outperforms recent state-of-the-art methods both quantitatively and qualitatively. Code is released at https://github.com/zhengchen1999/RGT.
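    The sketch below renders the RG-SA idea as described in the abstract: recursive aggregation of the input into a small set of representative features, followed by cross-attention from the full feature map to them, with reduced query/key channels. It is a simplified, assumption-laden rendering, not RGT's actual implementation; layer choices and sizes are illustrative.

    # Minimal RG-SA-style sketch (hypothetical configuration).
    import torch
    import torch.nn as nn

    class RGSASketch(nn.Module):
        def __init__(self, dim=64, reduced_dim=32, num_steps=2):
            super().__init__()
            # Recursive aggregation: repeated strided convs shrink the map
            # into a small set of representative tokens.
            self.aggregate = nn.ModuleList(
                [nn.Conv2d(dim, dim, 3, stride=2, padding=1)
                 for _ in range(num_steps)]
            )
            # Query/key channel dims are reduced to cut channel redundancy.
            self.q = nn.Linear(dim, reduced_dim)
            self.k = nn.Linear(dim, reduced_dim)
            self.v = nn.Linear(dim, dim)
            self.scale = reduced_dim ** -0.5

        def forward(self, x):                      # x: (B, C, H, W)
            b, c, h, w = x.shape
            rep = x
            for conv in self.aggregate:            # recursive downsampling
                rep = conv(rep)                    # (B, C, H/2^t, W/2^t)
            q = self.q(x.flatten(2).transpose(1, 2))    # (B, HW, r)
            k = self.k(rep.flatten(2).transpose(1, 2))  # (B, N, r)
            v = self.v(rep.flatten(2).transpose(1, 2))  # (B, N, C)
            attn = (q @ k.transpose(1, 2)) * self.scale # (B, HW, N)
            out = attn.softmax(dim=-1) @ v              # (B, HW, C)
            return out.transpose(1, 2).reshape(b, c, h, w)

    x = torch.rand(1, 64, 32, 32)
    print(RGSASketch()(x).shape)  # torch.Size([1, 64, 32, 32])

    Because the number of aggregated tokens N is fixed by the recursion depth, the attention cost scales with HW·N rather than (HW)², which is what makes this style of design suitable for high-resolution inputs.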